Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data.
نویسندگان
چکیده
Most methods for modeling species distributions from occurrence records require additional data representing the range of environmental conditions in the modeled region. These data, called background or pseudo-absence data, are usually drawn at random from the entire region, whereas occurrence collection is often spatially biased toward easily accessed areas. Since the spatial bias generally results in environmental bias, the difference between occurrence collection and background sampling may lead to inaccurate models. To correct the estimation, we propose choosing background data with the same bias as occurrence data. We investigate theoretical and practical implications of this approach. Accurate information about spatial bias is usually lacking, so explicit biased sampling of background sites may not be possible. However, it is likely that an entire target group of species observed by similar methods will share similar bias. We therefore explore the use of all occurrences within a target group as biased background data. We compare model performance using target-group background and randomly sampled background on a comprehensive collection of data for 226 species from diverse regions of the world. We find that target-group background improves average performance for all the modeling methods we consider, with the choice of background data having as large an effect on predictive performance as the choice of modeling method. The performance improvement due to target-group background is greatest when there is strong bias in the target-group presence records. Our approach applies to regression-based modeling methods that have been adapted for use with occurrence data, such as generalized linear or additive models and boosted regression trees, and to Maxent, a probability density estimation method. We argue that increased awareness of the implications of spatial bias in surveys, and possible modeling remedies, will substantially improve predictions of species distributions.
منابع مشابه
ModEco: an integrated software package for ecological niche modeling
ModEco is a software package for ecological niche modeling. It integrates a range of niche modeling methods within a geographical information system. ModEco provides a user friendly platform that enables users to explore, analyze, and model species distribution data with relative ease. ModEco has several unique features: 1) it deals with different types of ecological observation data, such as p...
متن کاملModel Selection for Mixture Models Using Perfect Sample
We have considered a perfect sample method for model selection of finite mixture models with either known (fixed) or unknown number of components which can be applied in the most general setting with assumptions on the relation between the rival models and the true distribution. It is, both, one or neither to be well-specified or mis-specified, they may be nested or non-nested. We consider mixt...
متن کاملPredictive Risk Mapping of Leptospirosis for North of Iran Using Pseudo-absences Data
Leptospirosis is a common zoonosis disease with a high prevalence in the world and is recognized as an important public health drawback in both developing and developed countries owing to epidemics and increasing prevalence. Because of the high diversity of hosts that are capable of carrying the causative agent, this disease has an expansive geographical reach. Various environmental and social ...
متن کاملConsistent Estimation in Pseudo Panels in the Presence of Selection Bias
In the presence of selection bias the traditional estimators for pseudo panel data models are inconsistent. This paper discusses a method to achieve consistency in static linear pseudo panels in the presence of selection bias and a testing procedure for sample selection bias. The authors’ approach uses a bias correction term proportional to the inverse Mills ratio with argument equal to the “no...
متن کاملNovel Three-Step Pseudo-Absence Selection Technique for Improved Species Distribution Modelling
Pseudo-absence selection for spatial distribution models (SDMs) is the subject of ongoing investigation. Numerous techniques continue to be developed, and reports of their effectiveness vary. Because the quality of presence and absence data is key for acceptable accuracy of correlative SDM predictions, determining an appropriate method to characterise pseudo-absences for SDM's is vital. The mai...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- Ecological applications : a publication of the Ecological Society of America
دوره 19 1 شماره
صفحات -
تاریخ انتشار 2009